Variations of k-mean Algorithm: A Study for High-Dimensional Large Data Sets
Similar resources
Efficient Computation of k-Nearest Neighbour Graphs for Large High-Dimensional Data Sets on GPU Clusters
This paper presents an implementation of brute-force exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large high-dimensional data clouds. The proposed method uses Graphics Processing Units (GPUs) and scales across multiple levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU). The method is applicable to homogeneous computi...
Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can be decisive when analyzing high-dimensional data, especially with a small number of samples. Feature extraction methods do not perform well under these conditions. With small sample sets and high-dimensional data, exploring a large search space and learning from insufficient samples become extremely hard. As a result, neural networks and clustering a...
An approximate algorithm for top-k closest pairs join query in large high dimensional data
In this paper we present a novel approximate algorithm to calculate the top-k closest pairs join query of two large and high dimensional data sets. The algorithm has worst case time complexity $O(dnk)$ and space complexity $O(nd)$ and guarantees a solution within an $O(d^{1+1/t})$ factor of the exact one, where $t \in \{1, 2, \ldots, \infty\}$ denotes the Minkowski metric $L_t$ of interest and $d$ the dimensionality. It m...
O-Cluster: Scalable Clustering of Large High Dimensional Data Sets
Clustering large data sets of high dimensionality has always been a serious challenge for clustering algorithms. Many recently developed clustering algorithms have attempted to address either data sets with a very large number of records or data sets with a very high number of dimensions. This paper provides a discussion of the advantages and limitations of existing algorithms when they op...
Clustering of Data Using K-Mean Algorithm
Clustering is an automatic learning technique aimed at grouping a collection of objects into subsets or clusters. The goal is to form clusters that are internally coherent but clearly different from one another. In plain words, objects within the same cluster should be as similar as possible, whereas objects in one cluster should be as dissimilar as possible from ...
Journal
Journal title: Information Technology Journal
Year: 2006
ISSN: 1812-5638
DOI: 10.3923/itj.2006.1132.1135